(57) A method and system for enhanced perform-
ance multithread operation in a data processing system
which includes a processor, a main memory store and 
at least two levels of cache memory. At least one instruc- 
tion within an initial thread is executed. Thereafter, the 
state of the processor at a selected point within the first 
thread is stored, execution of the first thread is terminat- 
ed and a second thread is selected for execution only 
in response to a level two or higher cache miss, thereby 
minimizing processor delays due to memory latency. 
The validity state of each thread is preferably main- 
tained in order to minimize the likelihood of returning to 
a prior thread for execution before the cache miss has 
been corrected. A least recently executed thread is pref- 
erably selected for execution in the event of a nonvalidity 
indication in association with all remaining threads, in 
anticipation of a change to the valid status of that thread 
prior to all other threads. A thread switch bit may also 
be utilized to selectively inhibit thread switching where 
execution of a particular thread is deemed necessary.

Description

BACKGROUND OF THE INVENTION 

Technical Field 

The present invention relates in general to an im-
proved data processing system and in particular to an
improved high performance multithread data process-
ing system. Still more particularly, the present invention
relates to a method and system for reducing the impact
of memory latency in a multithread data processing sys-
tem.

Description of the Related Art 

Single tasking operating systems have been avail-
able for many years within computer systems. In such
systems, a computer processor executes computer pro- 
grams or program subroutines serially, that is no com- 
puter program or program subroutine can begin to exe- 
cute until the previous computer program or program 
subroutine has terminated. This type of operating sys- 
tem does not make optimum use of the computer proc- 
essor in a case where an executing computer program 
or subroutine must await the occurrence of an external 
event (such as the availability of data or a resource) be- 
cause processor time is wasted. 

This problem has led to the advent of multitasking
operating systems, in which a computer program is
divided into program threads. Each of the program
threads performs a specific task. While a computer
processor can execute only
one program thread at a time, if the thread being exe- 
cuted must wait for the occurrence of an external event, 
i.e., the thread becomes "non-dispatchable," execution 
of a non-dispatchable thread is suspended and the com- 
puter processor executes another thread of the same or 
different computer program to optimize utilization of 
processor assets. Multitasking operating systems have 
also been extended to multiprocessor environments 
where threads of the same or different programs can 
execute in parallel on different computer processors. 
While such multitasking operating systems optimize the 
use of one or more processors, they do not permit the 
application program developer to adequately influence 
the scheduling of the execution of threads. 

Previously developed hardware multithread proc- 
essors which maintain multiple states of different pro- 
grams and permit the ability to switch between those 
states quickly typically switch threads at every memory 
reference, cache miss or stall. Memory latencies in mod-
ern microprocessors are too long and first level on-chip
cache sizes are generally quite small. For example, in
an object-oriented programming environment program
locality is worse than in traditional environments. Such
a situation results in increased delays due to increased
memory access, rendering the data processing system
less cost-effective.

Existing multithreading techniques describe switch-
ing threads on a cache miss or a memory reference. A
primary example of this technique may be reviewed in
"Sparcle: An Evolutionary Design for Large-Scale Mul-
tiprocessors," IEEE Micro, Volume 13, No. 3, pp. 48-60,
June 1993. As applied in a so-called "RISC" (reduced
instruction set computing) architecture, multiple regis-
ter sets normally utilized to support function calls are
modified to maintain multiple threads. Eight overlapping
register windows are modified to become four non-over-
lapping register sets, wherein each register set is a re-
serve for trap and message handling. This system dis-
closes a thread switch which occurs on each first level
cache miss that results in a remote memory request.
While this system represents an advance in the art,
modern processor designs often utilize a multiple level
cache or high speed memory which is attached to the
processor. The processor system utilizes some well-
known algorithm to decide what portion of its main mem-
ory store will be loaded within each level of cache and
thus, each time a memory reference occurs which is not
present within the first level of cache the processor must
attempt to obtain that memory reference from a second
or higher level of cache.

It should thus be apparent that a need exists for an
improved data processing system which can reduce de-
lays due to memory latency in a multilevel cache system
utilized in conjunction with a multithread data process-
ing system.



It is therefore one object of the present invention to 
provide an improved data processing system. 

It is another object of the present invention to pro-
vide an improved high performance multithread data
processor system.

It is yet another object of the present invention to
provide an improved method and system for reducing
delays due to memory latency in a multithread data
processing system.

The foregoing objects are achieved as is now de-
scribed. A method and system are disclosed for en-
hanced performance multithread operation in a data
processing system which includes a processor, a main
memory store and at least two levels of cache memory.
At least one instruction within an initial thread is execut-
ed. Thereafter, the state of the processor at a selected
point within the first thread is stored, execution of the
first thread is terminated and a second thread is selected
for execution only in response to a level two or higher
cache miss, thereby minimizing processor delays due
to memory latency. The validity state of each thread is
preferably maintained in order to minimize the likelihood
of returning to a prior thread for execution before the
cache miss has been corrected. A least recently execut-
ed thread is preferably selected for execution in the
event of a nonvalidity indication in association with all
remaining threads, in anticipation of a change to the val-
id status of that thread prior to all other threads. A thread
switch bit may also be utilized to selectively inhibit
thread switching where execution of the current thread
is deemed necessary.

The above as well as additional objectives, fea-
tures, and advantages of the present invention will be-
come apparent in the following detailed written descrip-
tion.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the in- 
vention are set forth in the appended claims. The inven- 
tion itself, however, as well as a preferred mode of use, 
further objectives and advantages thereof, will best be
understood by reference to the following detailed de- 
scription of an illustrative embodiment when read in con- 
junction with the accompanying drawings, wherein: 

Figure 1 is a high level block diagram of a data
processing system which may be utilized to imple-
ment the method and system of the present inven-
tion;

Figure 2 is a high level logic flowchart of a process
which may be implemented within the data process- 
ing system of Figure 1 which illustrates basic oper- 
ation in accordance with the method and system of 
the present invention; 


Figure 3 is a high level logic flowchart of a process 
which may be implemented within the data process- 
ing system of Figure 1 which illustrates a simple 
prioritized thread management system in accord-
ance with the method and system of the present in-
vention;

Figure 4 is a high level logic flowchart of a process 
which may be implemented within the data process-
ing system of Figure 1 which illustrates a preemp-
tive prioritized thread management system in ac-
cordance with the method and system of the
present invention; 

Figure 5 is a high level logic flowchart of a process
which may be implemented within the data process- 
ing system of Figure 1 which illustrates a first thread 
management system in accordance with the meth- 
od and system of the present invention; and 


Figure 6 is a high level logic flowchart of a process 
which may be implemented within the data process- 
ing system of Figure 1 which illustrates a second 
thread management system in accordance with the 
method and system of the present invention.




DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENT 

With reference now to the figures and in particular 
with reference to Figure 1, there is depicted a high level
block diagram of a data processing system 10 which 
may be utilized to implement the method and system of 
the present invention. In a preferred embodiment proc- 
essor 12 of system 10 is a single integrated circuit su- 
perscalar microprocessor, which may be implemented 
utilizing any well-known superscalar microprocessor 
system such as the PowerPC Microprocessor manufac- 
tured by International Business Machines Corporation 
of Armonk, New York. As will be discussed below data 
processing system 10 preferably includes various units, 
registers, buffers, memories and other sections which 
are all preferably formed by integrated circuitry. As those 
skilled in the art will appreciate data processing system 
10 preferably operates according to reduced instruction 
set computing (RISC) techniques. 

As illustrated, data processing system 10 preferably
includes a main memory store 14, a data cache 16 and
instruction cache 18 which are interconnected utilizing
various bus connections. Instructions from instruction
cache 18 are preferably output to instruction flow unit
34 which, in accordance with the method and system of
the present invention, controls the execution of multiple
threads by the various subprocessor units within data
processing system 10. Instruction flow unit 34 selective-
ly outputs instructions to various execution circuitry with-
in data processing system 10 including branch unit 26,
fixed point unit 28, load/store unit 30 and floating point
unit 32.

In addition to the various execution units depicted 
within Figure 1 those skilled in the art will appreciate 
that modern superscalar microprocessor systems often 
include multiple versions of each such execution unit. 
Each of these execution units will have as an input 
source operand information from various registers such 
as general purpose registers 36 and floating point reg- 
isters 40. Additionally, multiple special purpose registers 
38 may be utilized in accordance with the method and 
system of the present invention to store processor state 
information in response to thread switching. 

In a manner well known to those having ordinary
skill in the art, in response to a load instruction load/store
unit 30 will input information from data cache 16 and
copy that information to selected buffers for utilization
by one of the plurality of execution units. Data cache 16
is preferably a small memory which utilizes high speed
memory devices and which stores data which is consid-
ered likely to be utilized frequently or in the near future
by data processing system 10. In accordance with an im-
portant feature of the present invention a second level
cache 20 is also provided which, in an inclusive system,
will include all data stored within data cache 16 and a
larger amount of data copied from main memory store
14. Level two cache 20 is preferably a higher speed
memory system than main memory store 14 and, by
storing selected data within level two cache 20 in ac-
cordance with various well known techniques the mem-
ory latency which occurs as a result of a reference to
main memory store 14 can be minimized.

A level two cache/memory interface 22 is also pro-
vided in accordance with the method and system of the
present invention. As illustrated, a bus 42 is provided
between level two cache/memory interface 22 and in-
struction flow unit 34 to indicate to instruction flow unit
34 a miss within level two cache 20, that is, an attempt
to access data within the system which is not present
within level two cache 20. Further, a so-called "Transla-
tion Lookaside Buffer" (TLB) 24 is provided which con-
tains virtual-to-real address mapping. Although not illus-
trated within the present invention various additional
high level memory mapping buffers may be provided
such as a Segment Lookaside Buffer (SLB) which will
operate in a manner similar to that described for trans-
lation lookaside buffer 24.

Thus, in accordance with an important feature of the 
present invention delays due to memory latency within 
data processing system 10 may be reduced by switch- 
ing between multiple threads in response to the occur- 
rence of an event which indicates long memory latency 
may occur. In one embodiment of the system depicted 
within Figure 1 a thread switch will occur, if enabled, in 
response to a level two cache miss on a fetch. That is, 
an attempt by the processor to access the level two 
cache to determine whether or not a memory request 
can be satisfied and an indication that the desired data 
or instruction is not present within the level two cache. 
This occurrence is typically processed by causing a 
memory request to be retrieved from main memory store 
14 and the memory latency which occurs during this pe- 
riod triggers, in accordance with the method and system 
of the present invention, a thread switch. In alternate 
embodiments of the present invention a thread switch 
is triggered only in response to the occurrence of those 
events which will take a longer period of time to com- 
plete than is required to refill the instruction pipeline (typ- 
ically 5 or 6 cycles). Thus, a thread switch may be trig- 
gered in response to a Translation Lookaside Buffer 
(TLB) Miss or Invalidate, a Segment Lookaside Buffer 
(SLB) Miss or Invalidate, a failed conditional store oper-
ation or other operations which require, on average, a pe-
riod of time which is longer than the time required for a
thread switch. By only switching threads in response to 
such events the necessity for increased complexity and 
replication of pipeline latches and additional pipeline 
states is avoided. 
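
The switching criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event names and every cycle count in ASSUMED_EVENT_LATENCY are hypothetical values chosen for the example, and only the pipeline refill figure of roughly 5 or 6 cycles comes from the description.

```python
# Pipeline refill cost taken from the text ("typically 5 or 6 cycles").
PIPELINE_REFILL_CYCLES = 6

# Hypothetical average latencies, in cycles, for illustration only.
ASSUMED_EVENT_LATENCY = {
    "l1_miss_l2_hit": 3,
    "l2_miss": 40,
    "tlb_miss": 25,
    "slb_miss": 25,
    "failed_conditional_store": 12,
}


def should_switch(event, switching_enabled=True):
    """Switch only for events expected to outlast a pipeline refill."""
    if not switching_enabled:
        return False
    return ASSUMED_EVENT_LATENCY[event] > PIPELINE_REFILL_CYCLES
```

Under these assumed numbers a first level miss that hits in the level two cache does not trigger a switch, while a level two miss does.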

A thread switch is accomplished, as described herein, by
providing a thread state register within a dedicated spe-
cial purpose register 38. This thread state register pref-
erably includes an indication of the current thread
number, an indication of whether single-thread or mul-
ti-thread operation is enabled and a validity indication
bit for each thread. Thus, if four threads are permitted
within data processing system 10, seven bits are re-
quired to indicate this information (two bits for the thread
number, one mode bit and four validity bits). Additionally,
two existing special purpose registers are utilized as
save-restore registers to store the address of the instruction
which caused the level two cache miss and store the
machine state register.
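
The seven-bit figure for four threads can be checked with a small packing sketch. The field order, the constant names and the helper functions pack_tsr and unpack_tsr are assumptions made for illustration; the description does not specify how the bits are arranged within the register.

```python
NUM_THREADS = 4                      # example from the text: four threads
THREAD_ID_BITS = 2                   # two bits are enough to number four threads
MODE_SHIFT = THREAD_ID_BITS          # assumed position of the single/multi-thread bit
VALID_SHIFT = THREAD_ID_BITS + 1     # assumed position of the first validity bit
TSR_WIDTH = THREAD_ID_BITS + 1 + NUM_THREADS   # 2 + 1 + 4 = 7 bits


def pack_tsr(current, multithread, valid):
    """Pack thread number, mode bit and validity bits into one integer."""
    if not 0 <= current < NUM_THREADS or len(valid) != NUM_THREADS:
        raise ValueError("field out of range")
    word = current | (int(multithread) << MODE_SHIFT)
    for thread, is_valid in enumerate(valid):
        word |= int(is_valid) << (VALID_SHIFT + thread)
    return word


def unpack_tsr(word):
    """Recover (current thread, multithread enabled, validity bits)."""
    current = word & ((1 << THREAD_ID_BITS) - 1)
    multithread = bool((word >> MODE_SHIFT) & 1)
    valid = [bool((word >> (VALID_SHIFT + t)) & 1) for t in range(NUM_THREADS)]
    return current, multithread, valid
```

With every field set, the packed value fits exactly in seven bits, which matches the count given above.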

In accordance with the method and system of the
present invention level two cache/memory interface 22
preferably permits multiple outstanding memory re-
quests, that is, one outstanding memory request per
thread. Thus, when a first thread is suspended in re-
sponse to the occurrence of a level two cache miss a
second thread would be able to access the level two
cache for data present therein. If the second thread also
results in a level two cache miss another memory re-
quest will be issued and thus multiple memory requests
must be maintained within level two cache/memory in-
terface 22. Further, in order to minimize so-called
"thrashing" the method and system of the present inven-
tion requires that at least a first instruction within each
thread must complete. Thus, if all threads within the sys-
tem are awaiting a level two cache miss and the first
thread is resumed it will not find the required data; how-
ever, in response to a requirement that at least the first
instruction must complete this thread will simply wait un-
til the cache miss has been satisfied.
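
A minimal sketch of the bookkeeping this paragraph implies is given below. The class L2MemoryInterface, its method names and the helper may_switch_away are hypothetical; the sketch only models one request slot per thread and the rule that a resumed thread must complete at least one instruction before it can be switched out again.

```python
class L2MemoryInterface:
    """Tracks at most one outstanding memory request per thread."""

    def __init__(self, num_threads):
        self.pending = [None] * num_threads   # one request slot per thread

    def issue(self, thread, address):
        """Record the memory request raised by a level two miss."""
        if self.pending[thread] is not None:
            raise RuntimeError("thread already has an outstanding request")
        self.pending[thread] = address

    def complete(self, thread):
        """Mark the thread's request as satisfied and return its address."""
        address, self.pending[thread] = self.pending[thread], None
        return address

    def outstanding(self):
        """Number of requests currently in flight."""
        return sum(slot is not None for slot in self.pending)


def may_switch_away(instructions_completed_since_resume):
    """Anti-thrashing rule: at least the first instruction must complete."""
    return instructions_completed_since_resume >= 1
```

If every thread is waiting on a miss, may_switch_away returns False for the resumed thread until its first instruction completes, so the thread simply waits rather than handing control on again.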

Thus, those skilled in the art should appreciate that
"multithreading," as defined within the present disclo-
sure wherein multiple independent threads are execut-
ing, may be accomplished in hardware in accordance
with the method and system of the present invention and
may be utilized to greatly reduce the delay due to mem-
ory latency by maintaining the state of multiple threads
(preferably two or three in accordance with the current
design) and selectively switching between those
threads only in response to a second level or higher
cache miss.

Referring now to Figure 2 there is depicted a high
level logic flowchart of a process which may be imple-
mented within the data processing system of Figure 1
which illustrates basic operation in accordance with the
method and system of the present invention. As depict-
ed, the process begins at block 60 and thereafter passes
to block 62. Block 62 illustrates the loading of all threads.
The process then passes to block 64 which depicts the
setting of the current thread i = 0. Block 66 then depicts
the execution of thread i until such time as the process
passes to block 68. Block 68 illustrates a determination
of whether or not a level two cache or translation looka-
side buffer (TLB) miss has occurred. In the event no such
miss occurs the process returns, in an iterative fashion,
to block 66 to continue to execute thread i.

Referring again to block 68, in the event a level two
cache or translation lookaside buffer miss has occurred,
the process passes to block 70. Block 70, in accordance
with an important feature of the present invention, illus-
trates a determination of whether or not thread switching
within the system is enabled. Those having ordinary skill
in the art will appreciate that in selected instances exe-
cution of a particular thread will be desirable and thus,
the method and system of the present invention pro-
vides a technique whereby the switching between mul-
tiple threads may be disabled. In the event thread
switching is not enabled the process passes from block
70 back to block 66 in an iterative fashion to await the
satisfaction of the level two cache miss.

Referring again to block 70, in the event thread 
switching is enabled the process passes to block 72. 
Block 72 illustrates the saving of the state of instruction 
register and the machine state register for thread i uti- 
lizing the special purpose registers (see Figure 1) and 
the process then passes to block 74. Block 74 illustrates 
the changing of the current thread to the next thread by 
incrementing i, accessing the appropriate registers and 
the process then passes to block 76. Block 76 illustrates 
the setting of the thread state for the new current thread 
and the process then returns to block 66 in an iterative 
fashion. 
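
The Figure 2 flow from block 68 onward can be sketched as a single handler. The class Core and its field names are hypothetical, and wrapping the thread index back to zero after the last thread is an assumption; the description only says that i is incremented.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    num_threads: int
    switching_enabled: bool = True
    current: int = 0                            # block 64: i = 0
    saved: dict = field(default_factory=dict)   # stands in for the save-restore registers

    def on_long_latency_miss(self, instr_addr, machine_state):
        """Blocks 70-76: return the thread that runs next after a miss."""
        if not self.switching_enabled:           # block 70: switching disabled
            return self.current                  # back to block 66, await the miss
        # Block 72: save the instruction address and machine state for thread i.
        self.saved[self.current] = (instr_addr, machine_state)
        # Block 74: advance to the next thread (wrap-around is an assumption).
        self.current = (self.current + 1) % self.num_threads
        # Block 76 would set the thread state before returning to block 66.
        return self.current
```

With three threads, successive misses move execution from thread 0 to 1 to 2 and back to 0; with switching disabled the current thread keeps running and nothing is saved.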

With reference now to Figure 3 there is depicted a 
high level logic flowchart which illustrates a process 
which may be implemented within the data processing 
system of Figure 1 which depicts a simple prioritized 
thread management system in accordance with the 
method and system of the present invention. As illus- 
trated, this process begins at block 80 and thereafter 
passes to block 82. Block 82 illustrates the loading of all
threads (0, n - 1) and the assignment of an associated
priority for each thread. The process then passes to 
block 84 which depicts the setting of the current thread 
i equal to the thread having the highest priority. There- 
after, the process passes to block 86. 

Block 86 illustrates the execution of thread i and the 
process then passes to block 88. Block 88 illustrates a 
determination of whether or not a level two cache or 
translation lookaside buffer miss has occurred and if not, 
as above, the process returns to block 86 in an iterative 
fashion to continue to execute thread i. 

Still referring to block 88, in the event a level two 
cache or translation lookaside buffer miss has occurred 
the process passes to block 90. Block 90, as described 
above, illustrates a determination of whether or not 
thread switching is enabled, and if not, the process re- 
turns to block 86 in an iterative fashion. However, in the 
event thread switching is enabled, the process passes 
to block 92. 

Block 92 depicts the saving of the state of thread i
and the marking of that thread as "NOT READY." There-
after, the process passes to block 94. Block 94 depicts
the concurrent processing of the switch event and the
marking of that thread as "READY" when the switch
event has been resolved, that is, when the level two
miss has been satisfied by obtaining the desired data
from main memory store. Continuing, the process pass-
es to block 96, while processing the switch event as de-
scribed above, to determine whether or not another
thread is ready for execution. If so, the process passes
to block 98 which illustrates the changing of the current
thread to the thread having the highest priority and a
"READY" indication. That thread's thread state is then
set, as depicted within block 102 and the process then
returns to block 86, in an iterative fashion as described
above.

Referring again to block 96, in accordance with an
important feature of the present invention, in the event
another thread within the system does not indicate
"READY" the process passes to block 100. Block 100
illustrates the changing of the current thread to the
thread which is least recently run. This occurs as a result
of a decision that the thread which was least recently
run is the thread most likely to resolve its switch event
prior to a subsequent thread and thus, delays due to
memory latency will be minimized by selection of this
thread as the current thread. The process then passes
to block 102 which illustrates the setting of the thread
state for this selected thread and the process then re-
turns to block 86 in an iterative fashion.
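
The selection made at blocks 96 through 100 can be sketched as follows. The function pick_next, the dictionary layout and the use of a larger number for a higher priority are assumptions for illustration, as is the last_run timestamp used to identify the least recently run thread.

```python
def pick_next(threads, current):
    """Choose the thread to run after `current` has been suspended.

    `threads` maps a thread id to a dict with the hypothetical keys
    "priority" (larger is higher), "ready" (bool) and "last_run" (a
    timestamp, smaller means run longer ago).
    """
    others = {t: s for t, s in threads.items() if t != current}
    if not others:
        return current                       # nothing else to run
    ready = [t for t, s in others.items() if s["ready"]]
    if ready:
        # Block 98: highest priority thread with a READY indication.
        return max(ready, key=lambda t: others[t]["priority"])
    # Block 100: no thread is READY, so take the least recently run one,
    # on the expectation that its switch event resolves first.
    return min(others, key=lambda t: others[t]["last_run"])
```

The fallback branch is what distinguishes this scheme from a plain priority scheduler: when every other thread is still waiting, the oldest wait is assumed to be the closest to completion.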

Referring now to Figure 4 there is depicted a high
level logic flowchart of a process which may be imple-
mented within the data processing system of Figure 1
which depicts a preemptive prioritized thread manage-
ment system in accordance with the method and system
of the present invention. As illustrated, this process be-
gins at block 110 and thereafter passes to block 112.
Block 112 illustrates the loading of all threads (0, n - 1)
and the assignment of an associated priority to each
thread. Thereafter, the process passes to block 114.
Block 114 illustrates the setting of the current thread i
equal to the thread having the highest priority. The proc-
ess then passes to block 116 which depicts the execu-
tion of thread i.

Next, the process passes to block 118. Block 118
illustrates a determination of whether or not a level two
cache or translation lookaside buffer miss has occurred
and if not, the process passes to block 120. Block 120
illustrates a determination of whether or not a higher pri-
ority thread has now been indicated as "READY" and if
not, the process returns to block 116 in an iterative fash-
ion, to continue to execute thread i.

Referring again to block 118, in the event a level
two cache or translation lookaside buffer miss has oc-
curred the process passes to block 122. As described
above, block 122 illustrates a determination of whether
or not thread switching is enabled and if not, the process
returns to block 116 in an iterative fashion. Referring
again to block 118, in the event a level two cache or
translation lookaside buffer miss has not occurred, but, as de-
termined in block 120, a higher priority thread than the
current thread now indicates "READY" the process also
passes to block 122. Block 122 then determines wheth-
er or not thread switching is enabled and if not, the proc-
ess returns to block 116 in an iterative fashion.

Still referring to block 122, in the event thread
switching is enabled, and either a level two cache or
translation lookaside buffer miss has occurred or a
higher priority thread than the current thread now indi-
cates "READY," the process passes to block 124. Block
124 illustrates the saving of the state of thread i and the
marking of that thread as "NOT READY." Next, the
process passes to block 126. Block 126 illustrates the
concurrent processing of the switch event, if any, and the
marking of the previously current thread as "READY"
when that switch event has completed. Of course, those
skilled in the art will appreciate that in the event the
previously current thread was suspended in response to
a higher priority thread indicating a "READY" state no
switch event will be processed and the previously current
thread will be marked "READY." Next, the process passes
to block 128. Block 128 illustrates a determination of
whether or not another thread is ready. If the thread
switch has occurred as a result of a level two cache or
translation lookaside buffer miss, a determination of the
ready state of each thread will be required; however, in
the event the thread switch occurs as a result of a higher
priority thread indicating a "READY" state then the
higher priority thread will clearly be available, as
determined at block 128. Thereafter, the process passes
to block 130 which illustrates the changing of the current
thread to the thread having the highest priority and an
indication of "READY."

Alternately, still referring to block 128, in the event 
the thread switch has occurred as a result of a level two 
cache or translation lookaside buffer miss the process 
passes to block 132. As described above, block 132 il- 
lustrates the changing of the current thread to the thread 
which was least recently run in accordance with the the- 
ory that this thread will be the first thread to achieve a 
"READY" state. Thereafter, the process again passes 
to block 134 which illustrates the setting of the thread 
state for the new current thread and the process then 
returns to block 116, in an iterative fashion, as described 
above. 
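
The two ways of reaching a thread switch in Figure 4, and the different marking of the suspended thread in each case, can be sketched as below. The function names, the string labels "miss" and "preempt", and the convention that a larger number means a higher priority are assumptions for illustration.

```python
def switch_reason(miss, current_priority, ready_priorities, switching_enabled):
    """Blocks 118-122: return "miss", "preempt" or None (keep running)."""
    if miss:                                               # block 118
        reason = "miss"
    elif any(p > current_priority for p in ready_priorities):   # block 120
        reason = "preempt"
    else:
        return None
    return reason if switching_enabled else None           # block 122


def status_after_suspend(reason):
    """Blocks 124-126: status of the previously current thread.

    After a miss the thread stays NOT READY until the switch event
    completes; after a preemption there is no switch event to wait for,
    so the thread is marked READY again.
    """
    return "READY" if reason == "preempt" else "NOT READY"
```

A preempted thread therefore remains a candidate for selection immediately, whereas a thread suspended on a miss becomes a candidate only once its data has arrived.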

With reference now to Figure 5 there is depicted a 
high level logic flowchart which illustrates a process 
which may be implemented within the data processing 
system of Figure 1 which depicts a first thread manage- 
ment system in accordance with the method and system 
of the present invention. As depicted, this process be- 
gins at block 140 and thereafter passes to block 142. 
Block 142 illustrates the loading of an idle loop for each 
thread (0, n - 1). Next, the current thread is set to i = 0, as
depicted in block 144. 

The process then passes to block 146 which illus- 
trates the execution of thread i and the process then 
passes to block 148. Block 148 illustrates a determina- 
tion of the occurrence of a switch event while thread 
switching is enabled. If this occurs the process passes 
to block 150 which illustrates the switching of threads 
and the setting of a new current thread. The process
then returns to block 146, in an iterative fashion.

Referring again to block 148, in the event a deter-
mination is made that no switch event has occurred, the
process passes to block 152. Block 152 illustrates a de-
termination of whether or not a task within the current
thread has ended and if not, the process returns to block
146 in an iterative fashion to continue execution. How-
ever, in the event a task has ended the process passes
to block 154. Block 154 depicts a determination of
whether or not another task is ready for execution within
the current thread and if so, the process passes to block
156. Block 156 illustrates the loading of the new task for
the current thread and this process then returns, in an
iterative fashion, to block 146 to continue execution of
the current thread.

Still referring to block 154, in the event no further 
tasks are ready within the currently executing thread the 
process passes to block 158. Block 158 illustrates the
starting of the idle loop for thread i and the process then 
returns, in an iterative fashion, to block 146 to await the 
occurrence of one of the enumerated events. 
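
The task handling within a single thread in Figure 5 can be sketched as follows. The class ThreadSlot, the placeholder IDLE_LOOP and the first-in first-out order in which ready tasks are taken are assumptions for illustration; the description does not state how the next task is chosen.

```python
from collections import deque

IDLE_LOOP = "idle loop"   # placeholder for the idle loop loaded at block 142


class ThreadSlot:
    def __init__(self):
        self.ready_tasks = deque()   # tasks waiting to run within this thread
        self.running = IDLE_LOOP     # block 142: each thread starts in its idle loop

    def on_task_end(self):
        """Blocks 154-158: load the next ready task or fall back to the idle loop."""
        if self.ready_tasks:                            # block 154
            self.running = self.ready_tasks.popleft()   # block 156
        else:
            self.running = IDLE_LOOP                    # block 158
        return self.running
```

A thread that runs out of tasks keeps executing its idle loop at block 146 until a switch event or a new task arrives.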

Finally, referring to Figure 6 there is depicted a high
level logic flowchart of a process
which may be implemented within the data process-
ing system of Figure 1 which depicts a second thread
management system in accordance with the method
and system of the present invention. As illustrated, this
process begins at block 170 and thereafter passes to
block 172. Block 172 illustrates the loading of an idle
loop for each thread (0, n - 1). Thereafter, as depicted
within block 174, the current thread is set to i = 0. Next, the
process passes to block 176. Block 176 illustrates the
marking of the current thread as "VALID" and the mark-
ing of all other threads as "NOT VALID." The process
then passes to block 178. Block 178 illustrates the exe-
cution of thread i.

Thereafter, as depicted in block 180, a
determination is made as to whether or not a switch
event has occurred while switching is enabled. If so, the
process passes to block 182. Block 182 indicates a de-
termination of whether or not another thread within the
system is "VALID." If not, the process returns to block
178, in an iterative fashion, to continue execution of
thread i. Alternately, in the event another thread is de-
termined as "VALID" the process passes to block 184.
Block 184 illustrates the switching of the current thread
to the new thread chosen from among those threads in-
dicating "VALID" state. The process then returns to
block 178 to execute the new current thread in the man-
ner described above.
Referring again to block 180, in the event a determi-
nation is made that no switch event has occurred or that
switching is not enabled, the process passes to block
186. Block 186 illustrates a determination of whether or
not the current task has ended and if so, the process
passes to block 188. Block 188 illustrates a determina-
tion of whether or not another task within the current
thread is ready for execution and if so, the process pass-
es to block 190. Block 190 illustrates the loading of the
new task for the current thread and the process then
returns to block 178, in an iterative fashion, to continue
execution of the current thread.

Referring again to block 188, in the event the current
task has ended, as determined at block 186, and a
subsequent task is not ready, the process passes to block
194. Block 194 illustrates a determination of whether or
not any other thread within the system indicates "VALID."
If not, the process passes to block 196, which illustrates
the starting of the idle loop for thread i, and the process
then returns to block 178, in an iterative fashion.
However, in the event another thread within the system
indicates "VALID," the process passes from block 194 to
block 200. Block 200 illustrates the marking of the
current thread as "NOT VALID" and the process then
returns to block 184 to change the current thread to a new
thread chosen from among the valid threads.
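
The branch structure of blocks 186 through 200 may be summarized, purely as a sketch, by a small Python function that maps the three determinations to the action taken. The function name and the string labels are hypothetical and serve only to name the outcomes described in the text.

```python
def on_task_boundary(task_ended, next_task_ready, other_valid_exists):
    """Return a label for the action chosen by blocks 186, 188 and 194."""
    # Block 186: has the current task ended?
    if not task_ended:
        return "check_new_task"  # proceed to block 192
    # Block 188: is another task ready within the current thread?
    if next_task_ready:
        return "load_next_task"  # block 190, then back to block 178
    # Block 194: does any other thread indicate VALID?
    if other_valid_exists:
        # Block 200 marks the current thread NOT VALID, then block 184 switches.
        return "mark_not_valid_and_switch"
    return "start_idle_loop"  # block 196, then back to block 178
```

For example, a task that has ended with no successor ready and no other "VALID" thread yields "start_idle_loop", which corresponds to block 196.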

Referring again to block 186, in the event the cur- 
rent task is not ended, the process passes to block 192. 
Block 192 illustrates a determination of whether or not 
a new task has become ready and if not, the process
returns to block 178, in an iterative fashion to continue 
the execution of thread i in the manner described above. 
However, in the event a new task has become ready, as 
determined at block 192, the process passes to block 
198. Block 198 illustrates a determination of whether or 
not any "NOT VALID" threads are present among the 
threads within the system and if not, the process returns 
to block 178, in an iterative fashion, to continue to exe- 
cute thread i. However, in the event a "NOT VALID" 
thread is present within the system the process passes 
to block 202. Block 202 illustrates the selection of one
"NOT VALID" thread, the marking of that thread as "VALID"
and the loading of the task which is now ready into
that thread. The process then returns to block 178, in
an iterative fashion, to continue to execute thread i. Thereafter,
in the event a thread switch event occurs, a "VALID" 
thread having the new task present therein is ready for 
execution. 
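
The handling of a newly ready task at blocks 198 and 202 may be sketched as follows. The two parallel lists (one "VALID" flag and one task per thread) and the choice of the first "NOT VALID" thread are assumptions made for this sketch; the description does not specify how the "NOT VALID" thread is selected.

```python
def admit_ready_task(valid_flags, tasks, ready_task):
    """Load ready_task into one NOT VALID thread and mark that thread VALID.

    valid_flags and tasks are parallel lists indexed by thread number
    (a hypothetical layout). Returns the index of the thread used, or
    None when every thread is already VALID.
    """
    for i, valid in enumerate(valid_flags):
        if not valid:
            # Block 202: select the thread, mark it VALID, load the task.
            valid_flags[i] = True
            tasks[i] = ready_task
            return i
    # Block 198: no NOT VALID thread is present; continue executing thread i.
    return None
```

After such a call succeeds, a later thread switch event finds a "VALID" thread holding the new task, as stated above.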



Claims 

1. A method for enhanced performance multithread 
operation in a data processing system which in- 
cludes a processor, a main memory store and at 
least two levels of cache memory, said method com- 
prising the steps of: 

executing at least one instruction within a first 
thread; 

thereafter storing a state of said processor at a 
selected point within said first thread, terminat- 
ing execution of said first thread and switching 
execution to a second thread only in response 
to an occurrence of an identified event having
a delay associated therewith which exceeds an
amount of time required for a thread switch; and

executing at least one instruction within said
second thread wherein processing delays due
to memory access latency are minimized.

2. The method according to Claim 1, further including
the steps of: 

storing an indication of non-validity in association
with said first thread in response to said
occurrence of said identified event, and

removing said indication of non-validity in as- 
sociation with said first thread following a com- 
pletion of said identified event. 

3. The method according to Claim 1, wherein said data
processing system includes a plurality of registers
and wherein said step of storing a state of said
processor at a selected point within said first thread
comprises storing a state of said processor at a
selected point within said first thread within a register
associated with said first thread.

4. The method according to any one of Claims 1 to 3,
further including the step of determining a validity
status for each thread within said data processing
system and selecting a second thread for execution
in response to said determination,

selecting a least recently executed thread for
execution in response to an indication of
non-validity in association with all remaining threads
within said data processing system following
said occurrence of said identified event, and

selectively inhibiting execution of a subsequent
thread within said data processing system in
response to a state of a switch enable bit.

5. The method according to any one of Claims 1 to 4,
further including the step of selecting said second
thread for execution following said occurrence of
said identified event in response to a priority
indication associated with each thread within said data
processing system.

6. The method for enhanced performance multithread
operation in a data processing system according to
any one of the previous claims,

wherein said switching execution is performed
in response to a level two or higher cache miss;
and

further comprises the step of:

maintaining an address indication for said
level two or higher cache miss before executing
said at least one instruction.



7. A system for enhanced performance multithread
operation in a data processing system which
includes a processor, a main memory store and at least
two levels of cache memory, said system comprising:

means for executing at least one instruction
within a first thread;

means for thereafter storing a state of said
processor at a selected point within said first
thread, terminating execution of said first
thread and switching execution to a second
thread only in response to a level two or higher
cache miss;

means for maintaining an address indication for
said level two or higher cache miss; and

means for executing at least one instruction
within said second thread wherein processing
delays due to memory access latency are
minimized.

8. The system according to Claim 7, further including:

means for storing an indication of non-validity
in association with said first thread in response
to said level two or higher cache miss, and

means for removing said indication of
non-validity in association with said first thread
following a retrieval from main memory of data or
instruction at said maintained address indication
for said level two or higher cache miss.

9. The system according to Claim 7, wherein said data
processing system includes a plurality of registers
and wherein said means for storing a state of said
processor at a selected point within said first thread
comprises storing a state of said processor at a
selected point within said first thread within a register
associated with said first thread.

10. The system according to Claim 7, 8 or 9, further
including:

means for determining a validity status for each
thread within said data processing system and
selecting a second thread for execution in
response to said determination,

means for selecting a least recently executed
thread for execution in response to an indication
of non-validity in association with all remaining
threads within said data processing system
following said level two or higher cache miss,

means for selectively inhibiting execution of a
subsequent thread within said data processing
system in response to a state of a switch enable
bit, and

means for selecting said second thread for
execution following said level two or higher cache
miss in response to a priority indication associated
with each thread within said data processing
system.

11. A computer program product comprising a system
for enhanced multithread operation in a data
processing system according to any one of claims
7 to 10.